7,981 research outputs found
CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems
Data availability is critical in distributed storage systems, especially when
node failures are prevalent in real life. A key requirement is to minimize the
amount of data transferred among nodes when recovering the lost or unavailable
data of failed nodes. This paper explores recovery solutions based on
regenerating codes, which are shown to provide fault-tolerant storage and
minimum recovery bandwidth. Existing optimal regenerating codes are designed
for single node failures. We build a system called CORE, which augments
existing optimal regenerating codes to support a general number of failures
including single and concurrent failures. We theoretically show that CORE
achieves the minimum possible recovery bandwidth for most cases. We implement
CORE and evaluate our prototype atop a Hadoop HDFS cluster testbed with up to
20 storage nodes. We demonstrate that our CORE prototype conforms to our
theoretical findings and achieves recovery bandwidth saving when compared to
the conventional recovery approach based on erasure codes.Comment: 25 page
- …